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ABSTRACT 

To correctly analyse data sets from current microwave detection technology, 
one is forced to estimate the sky signal and experimental noise simultaneously. 
Given a time-ordered data set we propose a formalism and method for esti- 
mating the signal and associated errors without prior knowledge of the noise 
power spectrum. We derive the method using a Bayesian formalism and relate 
it to the standard methods; in particular we show how this leads to a change 
in the estimate of the noise covariance matrix of the sky signal. We study the 
convergence and accuracy of the method on two mock observational strategies 
and discuss its application to a currently-favoured calibration procedure. 

Key words: cosmic microwave background 



1 INTRO 

Observations of temperature fluctuations in the Cosmic Microwave Background (CMB) are 
poised to become perhaps the most precise tools to probe cosmological models (e.g., Bond 
& Jaffe 1998). Characteristics of microwave detection technology — in particular, long-time- 
scale noise correlations — make these observations especially c hallenging (pelabrouille 1998| ; 
De Bernardis fc Masi 199| ; |G6rski, Hivon fc Wandelt 1999| ; [Tegmark 19971 ) • Such difficult 
observations — and such high scientific stakes — require equally careful analyses of the data. 



A map of the CMB sky is not enough; detector characteristics will ensure that the noise 
contribution to such a map is highly correlated from pixel to pixel, in a pattern depending in 
detail upon the detector noise spectrum and the observing strategy. But correct estimation 
of the map and the noise is especially crucial for cosmological analyses which are themselves 
trying to measure a variance, the intrinsic power spectrum, Ci of the CMB — mis-estimation 
of the noise can directly add or subtract (i.e., bias) the estimated Ct ( |G6rski 1994a| ; p or ski 
1994T5| ; [Lineweaver 199g ). 



Unfortunately, we usually do not have an independent determination of the detector noise 
power spectrum, at least not one that has been made under the same operating conditions as 
the actual experiment. Moreover, there are noise contributions such as atmospheric emission 
which may have similar characteristics but are completely independent of the instrument. 
For this reason, we prefer to estimate the noise directly from the experimental data, perhaps 
using prior estimates of the noise under different conditions as a guide. In this paper, we 
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discuss how to jointly estimate the microwave sky map and the noise correlation matrix using 
a maximum-likelihood formalism. This brings together two strands of CMB data analysis 
that had previously been seen as separate steps. We also discuss the further use of the map 
and correlation matrix for power spectrum estimation, and the pitfalls that this may incur 
relative to the ideal (but impossible to implement!) full analysis. Although CMB experiments 
have had to address this problem in the past, only cursory descriptions have been made of 
the methods used to estimate the noise and the validity of the approximations ( [Lineweaver 



1994; Devlin 1998) 



In Section g we begin with a discussion of CMB observations and a review of the Bayesian 
formalism for CMB mapmaking and parameter estimation. We extend this formalism to 
new maximum likelihood estimates for both the map and noise correlations. In Section |] we 
discuss an iterative method for determining this maximum likelihood and apply it to a two 
model observational strategies as a test. In Section |] we present the problem of estimating 
the noise for a data stream with large signal-to-noise, namely calibration with the dipole. In 
Section ^| we conclude. 



2 BAYESIAN FORMALISM 

As has become customary, we start our analysis with Bayes' theorem 

P{9\DI) oc P{9\I)P{D\9I) (1) 

where 9 are the parameters we are trying to determine, D is the data, and I is the "back- 
ground information" describing the problem ( |Lupton 1991| ), which we will often omit from 
our probability expressions. The quantity P(D\9I) is thus the likelihood, the probability of 
the data given a specific set of parameters, P(9\I) is the prior probability for the parameters, 
and the left-hand side of the equation is the posterior probability for the parameters given 
the data. 

Here, we will take the data, dj, as given by a time series of CMB measurements, 

d% — Si + 7ij = AipTp + rij, (2) 
p 

where % labels the time, t = i St, Sj and rij are the experimental noise and sky signal 
contributions at that time. The signal is in turn given by the operation of a "pointing 
matrix," A ip on the sky signal at pixel p, T p (i.e., the "map"); we take the latter to be 
already pixelized and smeared by the experimental beam, so A is a very sparse matrix with 
a single "1" entry for each time corresponding to the observed pixel. In what follows we will 
often rely on the summation convention and write J2 P Ai P T p = Ai P T p , or occasionally use 
matrix notation, as in AT, etc. 

We will assume that the observed noise Hi is a realization of a stationary Gaussian process 
with power spectrum N(u>). This means that the correlation matrix of the noise is given by 

(n inil ) = N„ = f P N(u)e-^-^. (3) 

The stationarity of the process requires (or is defined by) Nm = N(ti — U>). 
Most generally, we will take the parameters to be 

• The observed CMB signal on the sky, T p ; 

• The power spectrum of the noise, N(u) 

• (Possibly) any cosmological parameters which describe the distribution of the T p (i.e., 
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the CMB power spectrum, Cg, although we could also directly use the cosmological param- 
eters such as H and Q). 

Sometimes, we will assign a prior distribution for the CMB signal on the sky itself (i.e., 
uniform in the sky temperature) such that the cosmological parameters (or Cg) will be 
irrelevant — we will see that this doing this as an intermediate step retains all of the infor- 
mation in the dataset. At other times we will marginalize over the CMB signal itself and 
determine those parameters. 

With these parameters and the data, di, Bayes' theorem becomes 

P[T p ,N(w),Ci\di,I] oc P[N(u)\I\P(T p ,C t \I) 

xP[di\N(uj),T p ,I]. (4) 

Here, we have used two pieces of information to simplify slightly. First, the noise power 
spectrum, N(lu) does not depend at all on the signal, so we can separate out its prior 

distribution.^ Second, given the noise power spectrum and the sky signal, the likelihood 
does not depend upon the cosmological parameters. For the Gaussian noise we assume, the 
likelihood is simply 

-21n£ = -2ki P[di\N(u),T p ) 

= In \N W \ + J2( d * ~ s i) N ^{ d i' - 

ii' 

= J2[^N k + \d k -s k \ 2 /N k ] (5) 

k 

(ignoring an additive constant); recall that Si = J2 p Ai P T p . The second equality uses tildes 
to denote the discrete Fourier transform at angular frequency u) k . 
We will now apply these general formulae to various cases. 



2.1 Known noise power spectrum 

We will start with the simplest case, where we have complete prior knowledge of the noise 
power spectrum. This is the case that has been previously discussed in the literature, but 
we emphasize that it is very unrealistic. 

We assign a delta-function prior distribution to N, transforming it in effect from a 
parameter to part of the prior knowledge. First, we assume no cosmological information 
about the distribution of temperatures on the sky: P(T p ,Cg\I) = P{T p \I)P{C(\I); with 
this separation the posterior for Cg is simply the prior — the experiment gives us no new 
information. We will also assign a uniform prior to T p , lacking further information. Now, 
the posterior distribution for the sky temperature is simply proportional to the likelihood, 
which can be rewritten by completing the square in the exponential as 

P(T p \N,d,I) oc P(d\T p ,N,I) 

1 

oc 



\2ttC Npp i 



x exp 



1 1/2 



(6) 



with the mean (also, the likelihood maximum) given by 



t In principle, the noise may depend on the signal such as in the case of a nonlinear response of the system to large-amplitude 
signals such as planets or the galaxy; in practice data thus contaminated is discarded. 
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T = (A t N- x A) 1 A T N~ l d (7) 
(in matrix notation), and the noise correlation matrix by 

C N = [A T N~ l Ay\ (8) 

Occasionally, the inverse of this correlation matrix is referred to as the weight matrix. As is 
usual for linear Gaussian models, the mean is just the multidimensional least-squares solution 
to d = AT with noise correlation N. This is just the standard mapmaking procedure (Lupton 



|1991| ; flegmark 1996| | Wright 1996 bp , cast into the form of a Bayesian parameter-estimation 



problem. 

For the case of known noise, however, this map is more than a just a visual representation 
of the data. Even if we wish to determine the cosmological parameters, it is an essential 
quantity: we can write the prior for both the map and the spectrum as 

P{C t , T P \I) = P(T p \C e , I)P(C e \I) (9) 

using the laws of probability, and so we can see that our rewriting of the likelihood in the 
form of Eq. |B] remains useful. That is, the full distribution is only a function of the data 
through the maximum-likelihood map, T — in statistical parlance, T is a sufficient statistic. 
Thus for known noise, we can always start by making a map (and calculating its noise 
matrix, C^). 



2.2 Cosmological CMB priors 



Here, we briefly examine the specific form of the signal prior, P(T p \Ce, I), motivated by 
simple Gaussian models. That is, we take the sky temperature, T p , to be an actual realization 
of a Gaussian CMB sky, with covariance specified by the power spectrum, Cg, 



(TpTpi) — Ct, pp ' 



p ' Xpl J 



(10) 



(note that we include beam-smearing by a symmetric beam with spherical harmonic trans- 
form Bi in this definition); x p ■ x p r gives the cosine of the angle between the pixels. With 



this covariance, the prior becomes 



P{T p \C e J) 



1 



\2ttG 



Tpp'\ 



1 1/2 



exp 



9 E T P C T ^ pp ,T p 



(11) 



We thus have a posterior distribution for T p and Ce which is the product of two Gaussians, 
P(Cg,T p \d, I) oc P(T p \Cg, I)P(d\T p , I), given by Eqns. || and |TTJ. In the usual cosmological 
likelihood problem, we don't care about the actual sky temperature per se, but are concerned 
with the Ci (or the parameters upon which the power spectrum depends). Thus, we can 
marginalize over the T p , 

P(C e \d, I) = J dT p P(C £} T p \d, I) = P{Cg\I) J dT p P(T p \Ct, I)P(d\T p , I) 
= P(C e \I)P(f{d)\CgI), 



x exp 



1 



AM 



{ T p C t}pp' T p' + ( T p - T l 



n-l 

^N,pp' 



T p ' — T p i 



pp 



'121 
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where we have included a prior for the power spectrum itself, so we can write the Gaussian 
factor as the likelihood for the map given Ce, P(Ci\d,I) oc P(Ci\I)P(T\Ci). Equation [L2 



defines the effective likelihood for the map (T, now considered as the data rather than the 
timestream itself, d), which is easily computed again by completing the square, giving 



P(T p \CeI) = 1/2 ex P 

\2n (Ct pp > + Cjvpp')| 



pp' 



(13) 



This is just the usual CMB likelihood formula: the "observed map," T p , is just the sum 
of two quantities (noise and signal) distributed as independent Gaussians. Note again that 
the data only enter through the maximum likelihood map, T, although that calculation is 
only implicit in this formula. Further, the power spectrum Cg only enters through the signal 
correlation matrix, Ct, and in a very nonlinear way 

We can also play a slightly different game with the likelihood. If we retain the Gaussian 
prior for the CMB temperature but fix the CMB power spectrum, we can estimate the map 
with this additional prior knowledge. We will again be able to complete the square in the 
exponential and see that T p is distributed as a Gaussian: 

- l x \T p \C^dJ) 



P(T p \C e , d, I) = 1/2 exp 

\2,iiCw\ 



(14) 



with 

X 2 (T p \C e , d, I) = £ (f p - (Wf) p ) C w ) pp , (f p/ - (WT) p/ ) . (15) 
pp' 

Now, the mean is given by 

(Wf) = C T (C T + CVr 1 ? (16) 
which is just the Wiener Filter. It has correlation matrix given by 

Cyy = Ct(Ct + Cn) 1 Ct- (17) 

Note that the maximum-likelihood map, T still appears in these formulae, but it is no longer 
the maximum of the posterior distribution, now given by the Wiener filter, WT. 

This subsection has shown how many of the usual CMB data calculations can be seen 
as different uses of the Bayesian formalism: 

• the least-squares map (seeing it as a "sufficient statistic"), 

• the CMB cosmological-parameter likelihood function, and 

• the Wiener filter map of the CMB signal. 

The differences depend on what quantity is estimated (the map or the power spectrum) and 
what prior information is included. 



2.3 Unknown noise 

The previous subsection briefly outlined the Bayesian approach to CMB statistics, assuming 
a known noise power spectrum. Now, we will relax this assumption and approach the more 
realistic case when we must estimate both the experimental noise and the anisotropy of 
the CMB. We will take as our model a noise power spectrum of amplitude N a in bands 
numbered a, with a shape in each band given by a fixed function Pj, with a width of n a ; 
n a = — is the number of discrete modes in the band a where Am is the width of the band 

and co>o is the minimum frequency of the data-stream. We will usually take Pk = const so N 
is piecewise constant. That is, 
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N(u k ) = N a P k . 



(18) 



We again assign a constant prior to the sky map, T p . As the prior for the noise we will take 
P(N a ) oc 1/Na- With v = 1 and a single band, this is the usual Jeffereys prior advocated 
for "scale parameters" and the units on N a are irrelevant ( [Kendall fc Stuart 1977] ). 

With this model and priors, the joint likelihood for the noise and the map becomes 



p{T p ,N a \d,i) oc n 



n 



Na +na/2 



1 



N, 



exp 



exp 



2N, 



dk -A-kpTp 



~ |2 



2N, 



1 ^ 1 r 



a fcec 



(19) 



where k G a refers to a sum over modes in band a. We also define the estimate of the noise 
as e = d — AT for future use. 

We can simultaneously solve for the maximum-probability noise and signal. Carrying out 
the necessary derivatives gives 



<91n£ 

dT r> 



E 



=v- ^2 — (dk — Akp'T p i^j Ak p 



k£a Pk 



= (d- AT) T N- 1 A = e T N- 1 A 
(switching between indices in Fourier space and matrix notation) and 



<91n£ 

dN a 



1 1 

'2W n 



(n a + 2v) 



1 



E 



i 



dk ~ A^piTpi 



k&a - 1 k 

Setting these to zero and solving, we then find we must simultaneously satisfy 
(using matrix notation for simplicity) and 

2 1^1 



dk — A kp iTpi 



n a + 2u P k 



E 

k£a 



I ~ I 2 
\ £ k\ 



(20) 



(21) 



(22) 



(23) 



Equation |2^ is just the usual maximum-likelihood map solution; Equation |2^ is the average 
"periodogram" (i.e., the naive power spectrum calculated from the Fourier transform) of the 
noise over the band a, with a slight modification for the prior probability, parameterized by 
v] for wide bands this modification is irrelevant. As we will see below, iteration is actually a 
very efficient way to solve Equations ^ and |23] simultaneously. For future reference, we also 
write down the derivative with respect to the map at the joint maximum (i.e., substituting 



Eq. E3 into Eq. EO 



<91n£ 



E 



(n a + 2v) 



Sfcgq tkAkp/ Pk 
Efc'ea \£k'\ 2 /Pk< 



(24) 



If we fix the noise at the joint maximum, then the problem reduces to that of the previous 
section, a Gaussian likelihood in T p for which the usual tools can be applied. This is not 
a rigorous approach to the problem, however due to correlation between noise and signal 
estimation. To get a handle on this, we calculate the curvature of the distribution around 
this joint maximum, which we define as 



Qpp' 

a, 



pa 



Qaa' 



(25) 
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where a subscript a or p refers to a derivative with respect to N a or T p , respectively. T refers 
to the full matrix; Q to the sub-blocks. Explicitly, the parameter derivatives are 

(26) 

and 

d 2 ln£ n a + 2u 

dN Q dN Q , ~ Qaa ' ~ Wf~ 
with the cross-curvature 



<W (27) 



d 2 ln£ 1_ gfcA fc p 



-^E^- (28) 



The distribution about the joint maximum is not a Gaussian in the N a directions, so 
there is more information available than this. Nonetheless, if we treat the distribution as if it 
were Gaussian, we can calculate the covariance matrix, given by the inverse of this curvature 
matrix; if we assume that the matrix I pp i — Q pp iiQp" a < 3aa i< 5a'p' ls invertible, then we find that 
the effective variance of the map is increased from C^ pp i = G pp i to 

Cn pp > = (Cjv\w — GpaGaa'Ga'p') 



1 ^kp 1 l kp' 2 l ~k IS -kp 
AT TJ ^—^ I O,^ ATI TJ ^—^ 



N a ^ a Pk ^ (n a + 2v)Nl^ a P k Py 



(29) 



which we have evaluated at the simultaneous peak of the distribution to determine N a . 

To assess the importance of the correction term we can consider an approximate expres- 
sion that relates (C^f ) _1 to (C^)" 1 ; we shall assume that N(uj) is approximately white, with 
a constant number of frequency samples per bin, i.e., n a = n, P k = 1 for all a, and v — 1. 
The key parameter is then 

r v = — (30) 

L -^u) 1 p 

where r p is the total time spent on a pixel p and A w is the dimensionful width of the bins. 
A simple way of obtaining r p is by reordering the data stream in such a way that all time 
samples for a given pixel are grouped contiguously. Then the minumum frequency for a given 
pixel will correspond to 2tt/t p where r v is now the length of the data stream segment. We 
then have that the total number of discrete modes in bin of width A w is 2tc/ A w t p . The extra 
factor of two comes from equation (|29|) . 

If we take the ensemble average of Eq. ^9| we obtain 

({c$J) _1 > = (i - Vw) £<i> E KK> = (i - y/w) ((Cnpp')- 1 )- (31) 

For an equal amount of time spent on each pixel we have that n = A^Ttot/^ir where T to t = 
Npi x Tp so that Eq. |3l] simplifies to 

((<$J)~ 1 )=(l-^)((Cw)- 1 ). (32) 

This immediately shows us two limits. In the case of one frequency per band, the inverse 
variance is negative and the problem is ill-defined; in some sense one is trying to estimate 
too much from the data set. In the limit of just one band (i.e., n — > oo) the correction for the 
effective variance is negligible. For practical cases one needs enough bins to properly capture 
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the shape of the noise periodogram, so one has to consider some intermediate regime. For 
the case of one pixel and white noise, to get a correction of less than 1% to the C^ pp ^ one 
needs n > 10 2 N P i X . For more realistic cases of many pixels and non- white noise, much larger 
bins are necessary, and must be evaluated on a case by case basis. 

We note a caveat to this procedure: consider an experiment that simply chops with a 
given frequency of rotation in a circle on the sky (as in a simple model for Planck's observing 
strategy). Any source of noise synchronous with this motion will necessarily be very difficult 
to distinguish from sky signal. That is, there will necessarily be strong covariance between 
chop- or spin- synchronous noise and the signal. This will remain somewhat the case even if 
the pattern moves slowly on the sky. In this case, other information, such as the smoothness 
of the noise power spectrum, must be used. 



2.4 Noise Marginalization 



Instead of this joint solution, formally, at least, we know the appropriate procedure: marginal- 
ize over the quantity we don't care about (the noise power spectrum) to obtain the distribu- 
tion for the quantity we wish to know (the map, T p ). We can actually carry out the integral 
in this case: 



P{T p \d h I) 



dN k P(T p ,N k \d h I) 



oc 



n 



p~ d k — AkpTp 



-{n a /2+v-l) 



(33) 



(this is just Student's t distribution in a slightly different form than is usually seen). If we 
stay in Fourier space, the maximum of this distribution can be calculated to be the solution 
of 



<91nP 

dT r> 



E 



2u-2) 



J2k£a p k 


d~k — Ak p iTp>^ 


A hp 




dk' — A k i p iT p i 


2 



(34) 



Note that this is exactly the same form as Eq. |2^, the equation for the maximum probability 
map in the joint estimation case, with the prefactor (n a + 2v) replaced by (n a + 2v — 2). 
This is equivalent to changing the exponent of the prior probability from v to v — 1: the 
marginalized maximum for v = 1 ( Jefferys prior) is the same as the joint maximum for v = 
(constant prior). We have also seen that for n a 3> 1, the value of v is irrelevant, so that 
these maxima should be nearly equal. (Moreover, the numerical tools to calculate the joint 
solution, outlined below, can be used in this case, too.) Note also that if n a + 2u — 2 < 0, the 
equation isn't solved for any map. In this case, either our prior information is so unrestrictive 
or the bands are so narrow that it is impossible to distinguish between noise and signal, and 
the probability distribution has no maximum when the noise is marginalized. 

In principle, any further analysis of the map would have to rely on the full distribution 
of Eq. |33[ In practice, the t-distribution is quite close to a Gaussian although the tails 
are suppressed by a power-law rather than an exponential. It will thus be an excellent 
approximation to take the distribution to be a Gaussian 

P(T p \d u I) r 1 



2irC^ ppl 



1/2 



x exp 



■nEfV T p) ( C N,pp') ( T P' - T P' 



pp' 



(35) 
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with noise covariance given by 



\yNpp' ) 



d 2 In P(T p \di, I) 



dT p dT p , 



E 



2u-2) 



?, 12 



^ P k 



kGa 



(Sfce« \^k 



2v — 2) y-, EkAk p 



kec 



Pk 



E 

fc'eo 



fc'p' 



(36) 



where T is the solution to Eq. |34| and the derivatives are taken with respect to the full 
distribution of Eq. |34|. Note that the inverse of C|f (occasionally referred to as the weight 
matrix) is given by the sum of two terms. The first is just the weight matrix that would be 
assigned given the maximum probability solution for the noise, Eq. ||] (with the numerically 
irrelevant change of the prior v — > v — 1 as noted above). Again we find a correction term 
due to the fact that we are only able to estimate this noise power spectrum. This correction 
term can be minimized by taking sufficiently wide bands, as in the simultaneous estimation 
case above (and subject to the same caveats). 



3 AN ITERATIVE METHOD: CONVERGENCE AND ACCURACY 

In this section we propose an algorithm for finding the Maximum Likelihood Estimate (M LE) 
of the noise and signal and we apply it to two models that contain all the essential features 
of current CMB measurement processes. We parameterize the map and noise as described 
above, in terms of a set of pixel temperatures, T p , and a set of bandpowers, N a . The noise 
is stationary and Gaussian with a periodogram of the form: 

N(u) oc 1 + (37) 
to 

Given some prior knowledge of the position of u^ nee we divide the noise periodogram, N(u), 
into logarithmically equally spaced bins below c^knee and linearly equally spaced bins above 

^knee • 

Equations ^ and |23] are nonlinear and it is therefore impractical to obtain an explicit 
solution for N a and T p . However the structure of this system lends itself to an iterative 
approximation scheme; given the i th estimate the map, Tjf\ we can find the the i th estimate 



of the noise in a band a, N a from Equation |23|. As we shall see, this system will converge to 
the MLE of T p and N a . We note that this algorithm is similar to "Expectation-Maximization" 
(EM) algorithms ( |Moon 1996| ) which alternate between filling in unknown data (as in our 
mapmaking step) and calculating a maximum-likelihood estimation of some quantity (the 
power-spectrum estimation step). 

The algorithm starts with initial conditions: 
T (o) = o 

nL 0) = — ^EMMI 2 - (38) 

The iterative step is 

T« = (A T N^- l A)~ 1 A T N^~ 1 d (39) 
sW(u) = A(uj)T^ 

N® = —^—x E \d(uj)-s^(uj)\ 2 . (40) 

The first model CMB experiment we will analyse is an idealized small sky mapping 
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experiment for which we shall adopt the name fence scan ( |Tegmark 1997] ): for the first half 
of the experiment the beam is swept back and forward horizontally, shifting downwards until 
the entire patch has been covered; for the second half of the scan the beam is swept back and 
forth vertically, shifting horizontally to the left, again until the patch is completely covered. 
The end result is a set of perpendicularly crossed linked scans. The second model scan is a 
small sky version of the original Planck Surveyor scan strategy, where great circles intersect 
at the two poles (we shall call it a poles scan): The beam sweeps horizontally back and forth 
along a square patch of the sky, drifting downwards until the bottom of the patch when it 
reaches the end of the scan; at the edge of each horizontal sweep the beam returns to the 
same pixels (one on the left and one on the right). The crosslinking is then simply at the 
two end pixels (which play the role of the poles in the full-sky, great circle Planck scan). 
We consider a map with 10 x 10 pixels for the fence scan and 8 x 10 + 2 pixels for the 
poles scan. We generate 56000 observations per scan. The noise is generated by a stationary, 
Gaussian number generator with periodogram given by Eq. ^7| and we choose Uk Qee T s =Q.b 
where T s is the duration of one horizontal sweep; we have found this to be typical values in 
the BOOMERanG and MAXIMA experiments. 

The first thing to check is if the algorithm converges. We quantify this by comparing the 
map and periodogram in successive iterations using 

i frW - T( i_1 )l 2 

„i _ v P P ' 

'IT - 



where Nf, is the number of bins that characterize the periodogram and N P i X is the number 
of pixels in the map. The form of the denominator guarantees that spurious null values of 
these quantities won't bias the measure of convergence. In Figure [1] we plot the average of 
200 simulated experiments for three levels of signal-to-noise. 

There are essentially only two parameters which may effect the existence of, and con- 
vergence to, a fixed point: the number of samples per bin, n a , and the signal-to- noise of the 
experiment. As we saw in the previous section, n a must be sufficiently large for C^cfr to 
be positive definite; otherwise the problem is ill-posed. As one would expect from Eq. pn| , 
this is tied to the convergence of the iterative scheme we propose: if the n a are small the 
algorithm either does not converge or takes a prohibitively long time to converge (of order 
10 2 iterations). For simplicity of analysis we consider all linear bins to have the same num- 
ber of samples. We then find that the threshold for both of the experimental strategies is 
n a ~ 2 — 5 x 10 2 ; the number of logarithmic bins between the minimum frequency and iiVee 
are N < = (wknee/wo) (In n a )/n a . For values greater than this, the system has difficulties in 
converging, for smaller values there is very little increase in speed. A few comments are in or- 
der. Firstly in choosing such values of n a we are in a regime in which both the joint estimate 
of (T p , N a ) and the sole estimate of T p using the marginalized likelihood give essentially the 
same results. Secondly one may worry that one is smoothing out important features in the 
noise periodogram, in particular mis-estimating the shape of the periodogram below u>knee 
and smoothing over spectral "spikes" . We have found that in choosing N< logarithmic bins 
below Wknee we adequately resolve the shape while at the same time having enough samples 
per bin to get rapid convergence. Fine structures in the noise periodogram are typically 
wide enough to be resolved with the resolution we propose here; very fine structures are 
usually due to systematic effects and should be taken care of as such. Thirdly, as we saw in 
Section [2.3|, the wider the bins, the smaller the correction to the noise covariance matrix; 
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Figure 1. The convergence of the algorithm for the map, rf T and noise periodogram, rf N for experiments with signal-to-noise 
of 5 (open circles), 1 (closed pentagons) and 0.2 (stars). 

this must be taken into account when subsequently using Cn for estimating C^. For the 
"borderline" values of n a we use for Figure [I] we expect the correction terms to increase CV 
by 50-100%. Larger values of n a for which convergence is slightly more rapid should give 
smaller corrections. 

The second factor we mentioned is the signal-to- noise of the experiment; the figure of 
merit to be used is the signal-to-noise per frequency band in the bands where there is signal. 
Let us clarify what we mean. For the ensemble average signal periodogram we know that 
there will only be a significant signal amplitude for frequencies corresponding to the scan 
frequency and its harmonics. Comparing to the ensemble average noise periodogram, there 
will be two limiting regimes: if the signal-to-noise is small, the noise will dominate and 
the "spikes" of the signal will be subdominant; if the signal-to-noise is large, the signal 
periodogram, at the scan frequency and its harmonics will stand out. The signal-to-noise in 
these bands is naturally related to the signal-to-noise per beam; for the case of white noise 
one has 



S_ 
N 



S 

band N 



(42) 

beam 



where the constant of proportionality is dependent on the characteristics of the experiment 
(see Appendix 0). The larger the S/N of the experiment, the further away the initial estimate 
of the periodogram will be from the ML estimate. Prior expectation of the smoothness of 
N(u) can also be used to distinguish between noise and spikes from the signal, although, as 
mentioned above scan-synchronous noise (which produces similar spikes in the periodogram) 
is very difficult to disentangle from the signal. From Figure [I] we can read off the effect for 
different levels of signal-to-noise. For rf Ty we can see that, regardless of the level of signal- 
to-noise, between 0.2 — 5, the algorithm converges to better than one percent in the first 5 
iterations. For signal-to-noise up to one, the speed of convergence is approximately constant, 
while for high signal-to-noise, after a transient of slow convergence, the map converges to the 
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Figure 2. Results from 200 realization of the fence (a) and poles (b) strategy for signal-to-noise of 1. The dashed line is the 
ensemble average periodogram, open circles are the average of the MLE estimate after 5 iterations and crosses are the average 
of the initial conditions of the periodogram as defined in Equation B8 
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Figure 3. Results from 200 realization of the fence (a) and poles (b) strategy for signal-to-noise of 5. The dashed line is the 
ensemble average periodogram, open circles are the average of the MLE estimate after 5 iterations and crosses are the average 
of the initial conditions of the periodogram as defined in Equation [3§| 

fixed point much faster. The evolution of r] l N is much more sensitive to the signal-to-noise: 
the higher the signal-to-noise the slower the convergence at each iteration. 

Thus far we have shown that the algorithm converges to a fixed point which we know is 
the MLE estimate of T p and N a . It is now important to check how biased such an estimate is, 
i.e., whether for an ensemble of realizations of the noise time series the average of the MLE 
N a is centred on the true ensemble average N a . To do so we have generated 200 realizations 
of the mock experiments for both fence and poles scans. In Figure § we plot the results for 
an experiment with signal-to-noise per beam of order unity. The open circles are the MLE 
of the N a after 5 iterations of our algorithm, and as we see match up with the ensemble 
average noise periodogram (dashed lines) for both the poles and fence scans. Moreover we 
find that the initial estimate of the noise periodogram, (crosses), is also a good estimate 
of the N a at this level of signal-to- noise. 
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Figure 4. Results from 200 realizations of the fence (a) and poles (b) strategy for signal-to-noise of 1 and fence (c) and poles (d) 
strategy for signal-to-noise of 5. The dashed line is the ensemble average periodogram, open circles are the average of the MLE 
estimate after 5 iterations and crosses are the average of the initial conditions of the periodogram as defined in Equation |4.| 



Naturally, for higher signal-to-noisejDer beam, the initial estimate is not that good. In 
Figure ^| we plot the ensemble average N(u) as a dashed line, as crosses and the MLE 
of N a as open circles for experiments with signal-to- noise ofjj. In the high signal-to- noise 
range of the periodogram, which corresponds to to > c^knee, is a very bad estimate of 
N a ; one is misinterpreting signal as noise. We clearly see, however the the MLE estimate 
of N a is centred on the N(u), a result which is not surprising given what we found in our 
analysis of the convergence of the algorithm. 

A possible alternative to initial conditions, T^°> and iV^ ) , of Equation |3^ is to start with 
the raw map: 

« = (A T A)-'A T d. (43) 

Indeed in the absence of low frequency, 1/u noise, this is the MLE solution to the map. We 
have regenerated our monte carlo results using these initial conditions and present them in 
Figure ^. The first thing to note is that the MLE of N a is centred on the ensemble average 
N(u) just as in the original choice of iV^. Again this is evidence that the MLE fixed point is 
a strong attractor of the iterative algorithm, and relatively insensitive to initial conditions. 
In addition we find that iV^ ) constructed from the raw map estimate of the signal is a bad 
estimator for N a in the low frequency range. This is true in both low and high signal-to-noise 
experiments given that at low frequencies the noise component of the periodogram always 
dominates: the raw map misrepresents the low frequency noise as signal and subtracts it 
away from the data stream as such. 



4 DIPOLE CALIBRATION 

In this section, we apply the formalism developed in the earlier sections to a somewhat 
different problem: the calibration of anisotropy experiments off the CMB dipole. Nowadays, 
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many anisotropy experiments have a "spinning mode" when they rotate in a circle of constant 
elevation on the sky ( [Lee et at 1999| ). (Over short periods of time, this scan is approximately 



a scan at constant "latitude" relative to some fixed pole on the sky) The circle is chosen to 
have sufficiently large opening angle such that the measurement is sensitive to large-angle 
CMB anisotropy. The amplitude of the CMB dipole is sufficiently high that the data for 
these scans is dominated by the modulation of the dipole around the circle. In this case, the 
data are given by 

di = rii + A ip (a x x p + T P + a 2 g p ) (44) 

where and T p are noise and anisotropy signal as before, and x p is the dipole pattern at 
pixel p, which has a well-measured amplitude and well-understood pattern. The constant 
of proportionality for the dipole, ai, is then the calibration of the detector. We also allow 
contamination from the Galaxy at that pixel, a 2 g p , where a<i allows an extrapolation from 
the frequencies where the Galaxy is well-measured to those observed by the experiment 
under consideration. (Of course, if we take the signal, T p , or the Galaxy, g p , to be in real 
temperature units, they too must include the calibration parameter.) 

We wish to basically apply the same rationale as before: determine a along with the other 
parameters, or ideally determine a marginalizing over the other parameters. For the purposes 
of dipole calibration, we can ignore the intrinsic CMB contribution, or, for a particular power 
spectrum, include it in the noise (because of the circular nature of the scan, the isotropy of the 
CMB translates to an effective stationarity for circular scans, so the CMB signal contribution 
could easily be included). Because the signal-to-noise of the dipole measurements is so high, 
it hardly matters. We can thus rewrite the above equation in a suggestive form: 

di = ni + B i0 a + B a ai = v { + B^aj (45) 

where B i0 = A ip x p and Bn = A ip g p . In this form, we immediately see that we can use the 
techniques developed above to estimate {a ,ai}. For known noise, the best-fit calibration 
would be 

a t = (B T N' 1 B);'(B T N- 1 d) ti (46) 

In fact, if our model of the Galaxy were perfect, we could just set a = a\ = a 2 and 
estimate that single parameter: 

a = [(B i0 + BafN-^Bm + B a )] 1 (B l0 + B lX ) T N- l d (47) 

where now there are no matrix manipulations (except for N^ 1 ). However, our extrapolation 
of galactic models and observations remains sufficiently uncertain that it is probably best to 
wholly ignore the data within the galactic plane, which amount to a small number of pixels 
in a dipole scan. 

With unknown noise, we can again iterate to simultaneously determine the noise power 
spectrum and the calibration. However, there is one significant difference in this case: because 
the dipole scans describe nearly perfect circles on the sky, the dipole signal is nearly sinusoidal 
(and the Galaxy signal, although complicated, is nearly periodic with the same period). 
Thus, most of the signal power will be located in a very narrow frequency bin (or harmonics 
thereof) and will have much higher signal-to- noise than the CMB anisotropy. This has several 
practical effects. Outside of this bin, the noise power can be determined by simply taking 
the periodogram of the data. If we expect the noise power spectrum to be smooth compared 
to the width of the dipole spike, we may be able to estimate the noise outside of the spike 
and smoothly extrapolate into the region shared with the dipole. In this case, we expect 
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that the determination of the noise and calibration to be nearly independent, and so the 
marginalization exercise above should prove superfluous (as should any need for iteration). 

The simplicity of this result is somewhat compromised by the presence of the galactic 
signal. To a very good approximation, we can assume that the contribution from the Galaxy 
is zero except for a small number of pixels where it overwhelms the rest of the data, and again 
that the dipole signal is much larger than the anisotropy signal. As our models of moderate- 
latitude galactic emission improve, this information can be incorporated. But because of 
the known angular frequency of the dipole signal, it is safe to remove any part of the data 
possibly contaminated by galactic emission. 

In this case, the data are 

where we also take 

(n 2 ) — > oo, where 6{ G galaxy. (49) 

The formally infinite noise in these pixels just amounts to ignoring those rows and columns 
in matrix manipulation. 

How does this method compare with more traditional methods of dipole calibration? 
Rather then encompass all of the noise into a power spectrum, it is customary to split the 
noise into 

Hi = Wi + (a + hi) (50) 

where a and b are constant over short segments of the data, and Wi is a white noise com- 
ponent. That is, the noise is described as an offset, a linear drift, and white noise, over 
sufficiently short timespans. The amplitude of the noise, a 2 = (w 2 ) is estimated and a least- 
squares fit to a, b and the calibration is done on the short segments and combined at the end 
of the process. This can be complicated somewhat by marginalizing over the (a, b) in each 
segment and finding a single, global calibration and errors. To the extent that this is a good 
model for the noise, this procedure should find correct results. Note, though, that Eq. [5D] is 
not a completely general description of noise while our result simply assumes Gaussian and 
stationary noise. 

Of course, this procedure can be extended to deal with any sort of "template," that is, 
signals of known shape but unknown amplitude. Such amplitudes can be estimated or even 
marginalized over in the usual way for Gaussian likelihoods. Here, though, we present it as 
a superior way to account for low-frequency noise when calibrating experiments. 



(48) 



5 DISCUSSION & CONCLUSIONS 

In this paper we have addressed the task of correctly estimating noise in current CMB 
experiments. Our main points and conclusions are: 

(i) Long term, correlated noise is ubiquitous in current CMB experiments. The depen- 
dence of the noise characteristics on the specifics of the actual observational conditions make 
it essential to devise an algorithm which can estimate the noise from the data stream itself. 

(ii) Under the assumption of stationarity and Gaussianity of the noise we present the 
Maximum Likelihood estimates of both signal and noise from a data stream. These estimates 
are solutions to a coupled set of nonlinear equations which must be solved iteratively. 

(iii) Given that we are now jointly estimating the signal and noise, there will be correla- 
tions between the map and the pixel noise covariance matrix. The standard expression for 
the pixel noise covariance matrix is now modified by a correction term which is effectively 
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proportional to the number of pixels and inversely proportional to the width of the bands 
one is using to characterize the noise periodogram. 

(iv) If we forgo estimating the noise periodogram and marginalize over it, we obtain an 
estimate for the map and its covariance which is effectively equivalent to the joint estimate 
of map and noise in the limit of wide bands. 

(v) We show that, for wide enough bands, the iterative method converges quickly to the 
fixed point. For convergence better than a few percent, this can be achieved in 4-6 iterations, 
for S/N ~ 0.2 - 5. 

(vi) An analysis of two standard observational strategies shows that our estimate is un- 
biased. We show that the two conventional alternative methods are biased for high signal- 
to-noise. 

(vii) It is conventional wisdom that an ideal observational strategy has signal-to-noise of 
unity ( [Tegmark 1997| ). We show that, in this regime, a good approximation to the estimator 
of the noise periodogram is the banded periodogram of the data stream. Thus one is able to 
avoid the time consuming process of performing multiple iterations. 

A few additional comments should be made: 

(i) For Gaussian theories one can by-pass the map estimation step altogether, constructing 
an iterative algorithm that estimates N(u) and Ce directly. This sensibly puts the spatial 
power spectrum of the signal and the temporal power spectrum of the noise on the same 
footing. A priori, however, the computational expense of such an algorithm is prohibitive; 
however, it is possible to simplify the matrix manipulations exploiting the stationarity of 
the noise and the isotropy of the signal. 

(ii) We have not addressed the computationally feasibility of our method; it is well known 
that for data sets of current CMB experiments and the expected megapixel datasets there 
is a serious problem with actually performing all the matrix manipulations ( [Borrill 1998| ; 



Delabrouille 1998| ; |G6rski, Hivon fc Wandelt 1999| ). In the case of our method these problems 



are of course worse: one must perform these matrix manipulations multiple times. We have 
limited ourselves to implementing our procedure assumimg the limitations of the standard 
map making procedure (we do not think that there is an alternative more efficient but ac- 
curate methods which can be applied for large data sets) and are therefore at the limit of 
computational capability with regards to data sets from experiments such as BOOMERanG 
and MAXIMA. A possible approach is to make local (in time) estimates of the noise pe- 
riodogram by restricting oneself to subsections of the time stream which are long enough 
to show the long term correlations of the noise but small enough to be computationally 
tractable. This approach is indeed justified in some of the current experiments which have 
'AC coupled detectors" i.e., whose signal is high-pass filtered by the hardware. 

(iii) We have also not addressed the problem of "gaps" in the data i.e., the fact that 
sections of the data may be contaminated with, for example, cosmic rays, high amplitude 
galactic emission, fast detector transients, etc. There exist a variety of techniques for esti- 
mating the Fourier transform of unevenly sampled data, from the more conventional ones 
like the Lomb periodogram flPress et al 1993| ) to advanced multitaper techniques ( [Komm ~e~\ 
al 1999| ). For many data sets, only a small fraction of the time series is in gaps; a simple 



filling-in procedure followed by a standard FFT periodogram will be adequate. 

(iv) In Section ^ we have only considered spatial templates (such as the Galaxy or the 
dipole). However, many CMB experiments suffer from the presence of signals which have a 
well defined structure in time ( [Kogut 1996|) . Examples of such signals are detector variations 



due to the mechanical motion of the experimental apparatus or microphonic contaminants 
with well defined frequency and phase. Once again this can be dealt with in the way de- 
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scribed in Section [| albeit with obvious modifications to Eqs. [45] and ^ to include temporal 
templates. 

(v) For ground-based and balloon-borne experiments an essential problem is atmospheric 
contamination ( pe Bernardis fc Masi 1998|) . The template is now spatially and temporally 
varying and therefore quite difficult to characterize in detail. A possible way of dealing with 
this contaminant is to have a channel dedicated to measuring atmospheric emission at high 
frequency during the time of observation; one then uses the data stream of this channel as 
a template. Of course this is an open problem and must be dealt with carefully in current 
CMB experiments. 

The techniques we have described here will be applied to the MAXIMA ([Lee et al 1999| ) 
and BOOMERanG ( Pe Bernardis 1999| ) experiments. 
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APPENDIX A: ANALYTICAL MODEL OF GREAT CIRCLE SCAN 

One would like to relate the signal-to-noise in the map to the signal-to-noise in the time 
series. To do so we will work out a simple analytical model of a one dimensional experiment 
( pclabrouille, Gorski fc Hivon 1998| ). We shall consider a great circle scan i.e., choosing 
x p = {dp-, <Pp) with 9 = 7r/2 and < <ft < 2tt. If we assume a scan speed <p — 2vr/T s we can 
define the pointing matrix to be 

A ip = 5[(i5t<p)mod(2n) - (j) p \. (Al) 
The periodogram of the data stream is 



2 



2 



d(uj k ) = A kp T p +N(u k ) (A2) 
where 



27r sin'^^ 



4, — <P r i{n B -l)™ k / <j> i<t> v uj k /<t> ( A1\ 



and we have assumed that the total time of the scan is T = n s T s , corresponding to n s 
consecutive scans of the great circle^ 

As we can see from Figure [AT], | A^ 2 is highly peaked at multiples of the scan frequency 
w s = 2n/T s (with peak amplitude (n s T s ) 2 ); this means that the signal in the time series is 
concentrated at these frequencies. A natural measure of the signal-to-noise in the time series 
is to compare the integrated power in signal for a band around a multiple kuj s of the scan 
frequency with the integrated power in noise within the same frequency band, i.e., 

_stHt:Mm^)) 



s_ 

N 



(A4) 



Naturally for arbitrary noise and signal, this ratio will depend on the multiple of the scan 

S_\ 

N I 



frequency one is looking at. For example if the noise has al/w the 4 1 band will be much 
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smaller at low u than at high u. Let us consider a simple case where the comparison is 
independent of k: (T p T p /) = a\b pp i and N tt i = a%5u'- We then find that (looking at k — 0): 

(fc+l/2)u; s 

1 '"s- L s u T 



duj(\s\ 2 (uj)} = 2im s T,o% 



(k-l/2)u. 

(k+l/2)u> s 

doj(N(oj)) = 2na 2 N . (A5) 

{k-l/2)ui s 



The signal-to-noise in a given band is then 
S n s T s a\ S 

band (T 2 N N 



N 



(A6) 

beam 
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